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Abstract —We propose a new gradient-domain technique for 
processing registered EM image stacks to remove inter-image 
discontinuities while preserving intra-image detail. To this end, 
we process the image stack by first performing anisotropic 
smoothing along the slice axis and then solving a Poisson equation 
within each slice to re-introduce the detail. The final image stack 
is continuous across the slice axis and maintains sharp details 
within each slice. Adapting existing out-of-core techniques for 
solving the linear system, we describe a parallel algorithm with 
time complexity that is linear in the size of the data and space 
complexity that is sub-linear, allowing us to process datasets as 
large as five teravoxels with a 600 MB memory footprint. 

Index Terms —Gradient Domain, Image Processing, Image 
Fusion 


I. Introduction 

Recent innovation and automation of electron microscopy 
sectioning has made it possible to obtain high-resolution im¬ 
age stacks capturing the relationships between cellular struc¬ 
tures H]. This, in turn, has motivated research in areas such as 
connectomics El, a, 01, a which aims to gain insight into 
neural function through the study of the connectivity network. 

While the technological advances in acquisition and reg¬ 
istration have made it possible to acquire unprecedentedly 
large micron-resolution volumes, the acquisition process itself 
introduces undesirable artifacts in the data, complicating tasks 
of (semi-)automatic anatomy tracking. Specifically, since the 
individual slices in the stack are imaged independently, discon¬ 
tinuities often arise between successive slices due to variations 
in lighting, camera parameters, and the physical manner in 
which a slice is positioned on the slide. An example of these 
artifacts can be seen in Figure (top-left), which shows an 
image of the same column taken from successive images in 
a stack (1850 images at a resolution of 21,504 x 26,624) 
imaging a mouse cortex 0. The visualization highlights the 
local discontinuities (thin vertical stripes across the image) that 
can arise due to the acquisition process. 

In this work we propose a new gradient-domain technique 
for processing these anatomical volumes to remove the un¬ 
desired artifacts. The processing consists of two phases. In 
the first phase, we perform anisotropic smoothing across the 
slice axis to smooth out the discontinuities between these 
slices. As Figure [T] (center) shows, this has the desirable 
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Fig. 1. Cross-sections of an EM stack showing zy-shces through the data 
(top) and xy-slicQS through the data (bottom). The image on the left is taken 
from the original data, the image in the center is the result of the initial 
anisotropic smoothing step, and the image on the right is the subsequent 
solution of the screened-Poisson equation. 


effect of removing the discontinuities (top) but it also smooths 
out the anatomical features within the slice (bottom). To 
address this, we perform a second step of gradient-domain 
processing on each slice independently, solving a screened- 
Poisson equation to generate a new voxel grid with low- 
frequency content taken from the anisotropically smoothed 
grid and high-frequency content taken from the original data. 
As Figure [T] (right) shows, this combines the best parts of 
both datasets - like the anisotropically smoothed grid, this 
solution does not exhibit discontinuities between slices (top), 
while simultaneously preserving the sharp detail present in the 
original data (bottom). 

Our implementation of the gradient-domain processing is 
enabled by adapting an existing, out-of-core Poisson solver Q 
to support the frequency-based merging of two images. As a 
result, our implementation is parallelizable, has linear time 
complexity, and sub-linear space complexity, supporting the 
efficient processing of truly large datasets. 

II. Related Work 

Over the last decade, gradient-domain approaches have 
gained prevalence in image processing 0. Examples include 
removal of light and shadow effects a, Eoi, reduction of 
dynamic range ini, El, creation of intrinsic images na, 
image stitching Gl, El, El, removal of reflections El, 
and gradient-based sharpening El- 
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The versatility of gradient-domain processing has led to 
the design of numerous methods for solving the underlying 
Poisson problem in the context of large 2D images, including 
adaptive mi, and hierarchical Eoi, im solvers. 

In this work we show that (1) the problem of removing 
inter-slice discontinuities in EM images can be reduced to the 
problem of performing frequency-based fusion of individual 
image slices, and (2) solving the the frequency-based fusion 
problem amounts to solving a 2D Poisson equation. This 
allows us to leverage efficient, out-of-core, and distributed 
Poisson solvers to process huge EM images. 

This extends our earlier (publicly available but unpublished) 
research ll22]| by replacing the computationally expensive 3D 
Poisson solver with a simple blurring operator. 

III. EM Image Processing 

The goal of our EM image processing is to remove the 
artifacts that arise due to the independent imaging of the slices 
in the 3D volume. In practice, the independent imaging results 
in a 3D volume that does not exhibit obvious artifacts when 
xy slices are considered individually, but exhibits distract¬ 
ing “popping” artifacts when viewed across the slice axis. 
Considered in the frequency domain, the artifacts are low- 
frequency in the xy-direction (hence unnoticed when slices are 
considered individually) but high-frequency in the z-direction 
(hence the unwanted “popping”). 

If, for simplicity, we consider a 3D image as comprised of 
four components: 

1) LL: low xy frequency and low z frequency, 

2) HL: high xy frequency and low z frequency, 

3) LH: low xy frequency and high z frequency, and 

4) HH: high xy frequency and high z frequency. 

the goal is to generate the signal where the LH component is 
removed. We address this in two steps, first, we smooth across 
the slice direction to remove the LH and HH components and 
then we fuse back in the high-frequency content within each 
slice to get back the HH component. 


A. Anisotropic Smoothing 

The smoothing of the image data is performed by convolv¬ 
ing the 3D image with an anisotropic Gaussian, aligned with 
the coordinate axes, which has low variance in the xy-plane 
and larger variance in the z-direction: 

/I = /o * G,, „ 

G* xy iG' z 


with Ga-^y,cr^ the anisotropic Gaussian: 












(bottom). This motivates a second processing stage in which 
we generate the slices of the new 3D image, /^, by fusing the 
low-frequency data from the slices of and high-frequency 
data from the slices of 7°. 

Conceptually, this can be implemented by iterating through 
the slices of 7° and 7^ and performing a frequency-space blend 
to generate the slices of 7^, giving less weight to the data from 
7° at lower frequencies and more weight at higher frequencies. 
As we show below, this can be formulated as a set of 2D 
gradient-domain problems where we seek a 3D image whose 
j-th slice minimizes: 

7| = arg min f a (7 — 7j)^-f || V7 — V7°||^ dp. 

Here ilj is the slice domain and a is the screening weight 
balancing the importance of interpolating pixel values of ij 
with the goal of matching the gradients of 7?. Using the Euler- 
Lagrange formulation, the minimizer is obtained by solving 
the linear (Screened-Poisson) system: 

{a - A)I] = alj - AI° (1) 

for each slice j. 

As visualized in Eigure[T] (right), the second step of process¬ 
ing re-introduces the HH component, providing a 3D image 
that has the sharp intra-slice detail of the input, 7°, without 
the inter-slice discontinuities. (More precisely, this step fuses 
back in the HL and HH components of 7°, but since the HL 
component is already in 7^, only the HH component changes.) 

Frequency-Space Interpretation 

As in the work of Bhat et al. ca, we get a frequency-space 
interpretation of Screened-Poisson blending by considering the 
system in the Eourier domain. Specifically, expressing the 2D 
slices 7” and ij in terms of their frequency decomposition: 

I°{x,y) = 

k,l 

k,l 

and using the fact that the complex exponential 

is an eigenvector of the Laplace operator with eigenvalue 

—471^ -\-P) we get: 

^ a + 47r2(fc2 + p) 

Thus, the (fc, ()-th Eourier coefficient of 7j is a weighted 
average of the (7, ()-th coefficients of 7? and ij, with the co¬ 
efficient of Ij receiving higher weight when the interpolation 
weight (a) is small or the frequency (k^ -f P) is large. 


B. Screened-Poisson Blending 

On its own, anisotropic smoothing dampens both the LH 
and HH components, removing desirable high-frequency con¬ 
tent within a slice. This is visualized in Eigure[T] (middle) - the 
smoothing effectively removes the inter-slice discontinuities 
(top), but it also blends out the details within each image 


Implementation 

An advantage of our formulation is that it provides a simple 
and scalable solution for EM image processing. 

The first step of our processing, anisotropic smoothing, is 
implemented by using a compactly supported approximation 
of the Gaussian (setting the weight to zero beyond three 
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Standard deviations). Because of the locality of the smoothing, 
this step can be implemented in a streaming fashion by 
maintaining a small window on the image in working memory, 
computing the weighted average for the in-core subset of the 
image (in parallel), and then advancing the window. This step 
has time complexity that is linear in the number of voxels in 
the 3D image and space complexity that only depends on the 
variances of the Gaussian, a^y and cr^. 

The second step of our processing, solving the Poisson 
equation, is implemented by adapting the parallel/distributed 
and out-of-core solver of Kazhdan et al. El]. We modihed 
the implementation in El to allow a user to input both a low- 
and a high-frequency image, setting up the constraints for the 
linear system as in Equation [T] and then using the existing 
multigrid solver to obtain the solution. Each solve has time 
complexity that is linear in the number of pixels in a slice and 
space complexity that is linear in the size of the row. 

Thus, for a 3D image with 0{N^) voxels, our processing 
has time-complexity 0{N^), space complexity 0{N), and can 
be implemented in parallel. Eurthermore, our implementation 
depends on only three parameters - the variances of the 
Gaussian used for anisotropic smoothing, axy and CTj, and 
the interpolation weight used for frequency-based fusing, a. 

Distributed Processing: While our evaluations are per¬ 
formed on a single machine, our approach is also trivial to 
distribute. In the hrst phase we can leverage the fact that we 
use a compactly supported approximation of the Gaussian for 
smoothing, so computing the values in slice Ij only requires 
knowing the values in slices 1°, with — /c| < 3 • (T 2 . In the 
second phase the 2D screened-Poisson equation is solved for 
each slice independently. Thus, the processing of N slices can 
be distributed across M machines by copying N/M -f 6 • cr^ 
slices to each machine, and then computing the smoothed (J^) 
and fused (/^) values for the N/M interior slices. 


Alternate Solutions 

In addressing the color correction problem, we considered 
two other implementations. 

In our initial research E2l we implemented the anisotropic 
smoothing by solving a gradient-domain problem in which 
the target gradient held was dehned by zeroing out the z- 
components of the gradients of the input. We have opted 
for the simpler Gaussian convolution presented in this work 
because it is signihcantly faster in practice and has space 
complexity that depends on the size of the blurring kernel. In 
contrast, the gradient-domain implementation of anisotropic 
diffusion requires the implementation of a streaming 3D 
Poisson solver and has space complexity proportional to the 
size of an image slice. 

We had also considered implementing the Poisson solver 
using the Past Pourier Transform El, El- However, we did 
not pursue this approach because the theoretical complexity of 
the PET is slower than that of multigrid (log-linear vs. linear) 
and a scalable implementation requires an out-of-core trans¬ 
pose (often causing an I/O bottleneck for computation El)- 


Extensions and Applications 

An advantage of our formulation is that our decomposition 
of the computation into two simple steps makes it easy to 
adapt the processing. As an example, Pigure demonstrates 
how robust averaging can be incorporated to mitigate the 
effects of bad data by showing three successive slices in 
a volume, where the middle slice of the input is missing 
data (top row). Using naive smoothing distributes the corrupt 
data into adjacent low-frequency slices (second row) which 
then introduces artifacts in the output, visible as a darker 
band in the lower part of the neighboring slices (third row). 
Instead, by only averaging color values within two standard 
deviations, we obtain robust low-frequency slices (fourth row) 
that avoid introducing artifacts into adjacent output slices and 
do not introduce a low-frequency solution into the region with 
missing data (bottom row). 

Additionally, though this work focuses on the processing 
of EM image stacks, we believe that our formulation of 
frequency-based fusion in the gradient domain contributes an 
general-purpose tool for image processing. As an example, 
Pigure shows an application of fusing a low-frequency 
color image with a high-frequency monochromatic image. 
Using the color image to dehne the value constraints and the 
monochromatic image to dehne the gradient constraints, the 
solution to the screened-Poisson equation provides a result that 
has both color and high frequency detaillH 

IV. Results 

To evaluate our approach, we consider both the quality of 
the image and run-time performance. In these evaluations, we 
hx the radius of the in-slice blurring kernel to — 1, the 
cross-slice bluning kernel to <7^ = 3 (both measured in pixels), 
and the screening weight to a = 0.001. 

A. Comparison to EMISAC 

We compare our method to the approach of Azadi et 
al. ESI (EMISAC). Similar to our original gradient-domain 
formulation Ea, EMISAC performs the editing by solving 
for the 3D image whose partial derivatives in the x- and 
y-directions match those of the input, and whose partials 
along the z-direction are close to zero. Unlike our earlier 
approach, EMISAC formulates the hltering as a solution to a 
constrained quadratic optimization problem, requiring the use 
of the computationally more expensive L-BPGS-B solver ETl . 

We evaluate the two methods on a set of cut-outs from the 
“Kasthurill” dataset a of a mouse cortex. 

Pigure compares the visual quality of the processed 
images which shows an a:?/-slice from the 1024 x 1024 x 100 
dataset. The image on the left shows a slice from the input, the 
one in the middle shows the corresponding slice after EMISAC 
hltering, and the one on the right shows the results of our 
approach. As can be seen from the hgure, the EMISAC result 
has less contrast than the input and exhibits blocking artifacts. 

Table [I] compares the performance of the two approaches on 
the cut-outs, obtained on a machine with two Xeon E5630 @ 

* To define the gradient constraints, we obtained an RGB image by copying 
the gray-scale value into each of the three color channels. 
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Fig. 3. Frequency-based fusion for merging high-frequency monochrome 
images (top left) with low-frequency color images (bottom left). By using the 
color image to define the low frequency components and the monochrome for 
the high-frequency, we obtain a new image that successfully incoiporates the 
color and detail from both (right). 


Fig. 2. Comparison of naive (middle) and robust (bottom) averaging for 
successive slices {j — 1, i, J -F1} from a 3D volume, showing the input data, 
7®, the output from the smoothing phase, I^, and the result of frequency-based 
fusion, 72 . By ignoring outlying color values when computing the weighted 
average across slices, we obtain a target low-frequency signal that mitigates 
the effects of bad data. 


a significantly more efficient implementation, with running 
times growing linearly with data size and in-core memory 
usage growing sub-linearly. 


2.53 GHz processors and 64 GB of RAM, parallelized across 
eight threads. As the table shows, EMISAC’s dependence on 
the L-BFGS-B solver comes at a noticeable increase in running 
time and memory. In the implementation of EMISAC, this is 
ameliorated by decomposing the 3D volume into cells and 
solving the problem on each cell independently. However, 
replacing a global system with a set of local systems can 
introduce inaccuracies and may be the cause of the blocking 
artifacts noted above. 

In contrast, the simplicity of our formulation results in 


B. Scalability 

To evaluate the scalability of our method, we also evaluated 
the performance on two large datasets. The datasets included 
the complete 1.2 teravoxel “Kasthurill” dataset as well as the 
5.2 teravoxel “Cardona” dataset from the Open Connectome 
Project ll28l . 

The performance of our approach on these two datasets is 
shown in Table [II] obtained on a machine with two Xeon 
X5690 @ 3.47 GHz processors and 48 GB of RAM, par¬ 
allelized across twelve threads. As the table shows, despite 
the large size of the datasets, the in-core memory usage 
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Input EMISAC Ours 


Fig. 4. Visual comparison of our method with EMISAC: Showing a 
slice from the input (left), the corresponding slice in the results of EMISAC 
(center), and the corresponding slice in our results (right). 

remains negligible, due to the out-of-core implementation of 
the smoothing and blending phases. The table also confirms 
that the running time scales linearly with the resolution. (By 
comparison, when run on the “Kasthurill” dataset, the hrst 
phase of our original gradient-domain implementation jza 
took 41 hours and used 42 gigabytes of memory on the same 
machine.) 

Combined with a parallelizable implementation, our ap¬ 
proach provides a solution for processing registered EM image 
stacks that scales to the resolution of today’s large datasets. 

V. Conclusion 

This work analyzes the artifacts commonly arising in EM 
stacks due to the independent imaging of the slices. We 
propose a simple approach that hrst smooths the image along 
the slice axis and then performs frequency-based fusion to 
obtain an image that maintains the sharp detail within the slices 
while removing the discontinuity artifacts across them. We 
describe an extension of the well-established gradient-domain 
processing paradigm that implements the fusion by solving 
a Poisson equation, thereby providing a scalable parallel 
solution that has linear time and sublinear space. Einally, we 
demonstrate the effectiveness of the approach in processing 
images as large as hve teravoxels using less than a gigabyte 
of working memory. 
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EMISAC 

Ours 

Peak (MB) 

Time (h:mm:ss) 

Peak (MB) 
Smooth / Blend 

Time (h:mm:ss) 
Smooth + Blend 

1024 X 1024 X 100 

6,577 

0:13:59 

59/ 

39 

0:00:24 -F 0:00:55 

1024 X 1024 X 200 

12,988 

0:37:47 

63 / 

38 

0:00:37 -F 0:01:51 

2048 X 1024 X 200 

26,089 

1:16:59 

68 / 

34 

0:01:23 -F 0:01:53 

2048 X 2048 x 200 

52,965 

3:25:08 

80 / 

55 

0:02:38 -F 0:02:45 

2048 X 2048 x 400 

* 

* 

105 / 

57 

0:05:18 -F 0:05:26 

4096 X 2048 x 400 

* 

* 

150 / 

66 

0:11:09 -F 0:08:00 

4096 X 4096 x 400 

* 

* 

202 / 

123 

0:21:22 -F 0:13:07 


TABLE I 

Performance comparison of our method with EMISAC: Showing peak memory usage and running time for processing dieferent 

SIZED CUTOUTS FROM THE KASTHURII 1 DATASET. PEAK MEMORY USAGE AND RUNNING TIME ARE PROVIDED SEPARATELY FOR THE SMOOTHING AND 
BLENDING PHASES OF OUR PROCESSING. (*AT RESOLUTIONS HNER THAN 2048 X 2048 X 200 THE EMISAC IMPLEMENTATION RAN OUT OF MEMORY 

AND COULD NOT COMPLETE.) 



Resolution 

Peak (MB) 
Smooth / Blend 

Time (h:mm:ss) 
Smooth + Blend 

Kasthuril 1 
Cardona 

21,504 X 26,624 x 1850 
32,768 X 32,768 x 4840 

369 / 480 

593 / 440 

23:34:17 -F 13:52:30 
109:13:50 -F 69:22:24 


TABLE II 

Performance of our method on two teravoxel datasets: Peak memory usage and running time are provided separately for the 

SMOOTHING AND BLENDING PHASES OF OUR PROCESSING. 



